Abstract

We show how to compactly represent any $n$-dimensional subspace of
$\mathbb{R}^m$ as a banded product of Householder reflections using $n(m - n)$
floating point numbers. This is optimal since these subspaces form a
Grassmannian space $\mathrm{Gr}_n(m)$ of dimension $n(m - n)$. The
representation is stable and easy to compute: any matrix can be factored into
the product of a banded Householder matrix and a square matrix using two to
three QR decompositions.

If $m > n$, the Householder QR algorithm represents an $m \times n$ orthogonal
matrix $U$ as a product of $n$ Householder reflections using a total of
$n(m - (n + 1)/2)$ floating point numbers [1, chap. 5]. However, in some
applications only the range of $U$ is important; any other orthogonal matrix
with the same range is equivalent. One example is the hierarchically
semiseparable representations of [2], where a tree of orthogonal matrices is
used to compress matrices with significant offdiagonal structure. Since the
$n$-dimensional subspaces of $\mathbb{R}^m$ form a Grassmannian manifold
$\mathrm{Gr}_n(m)$ of dimension $n(m - n)$, we expect that some orthogonal
matrix with the correct range can be represented with $n(m - n)$ floats. The
following theorem provides such a representation:

Theorem 1. If $m > n$, any $A \in \mathbb{R}^{m \times n}$ can be factored as

$$A = G \begin{pmatrix} B \\ 0 \end{pmatrix}$$

where $B \in \mathbb{R}^{n \times n}$ is square and $G$ is a product of $n$
Householder reflections with banded structure:

$$G = H_1 H_2 \cdots H_n, \qquad H_i = I - \frac{2 v_i v_i^T}{v_i^T v_i}, \qquad v_i = (\underbrace{0, \ldots, 0}_{i-1},\ 1,\ v_{i+1,i},\ \ldots,\ v_{i+m-n,i},\ \underbrace{0, \ldots, 0}_{n-i})^T \tag{1}$$

Since each $v_i$ has $m - n + 1$ nonzero components, the first of which is 1,
$G$ can be stored in $n(m - n)$ floats. If $A$ is full rank, the matrices $G$
and $B$ are unique, although the Householder vectors $v_i$ may not be.
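
The storage counts above can be checked with a few lines of arithmetic. The following Python sketch compares explicit dense storage, standard Householder QR storage, and the banded form. The function names are illustrative and do not come from the text, and the sketch assumes the leading 1 of each Householder vector is implicit and therefore not stored.

```python
def dense_floats(m, n):
    """Floats needed to store an m x n orthogonal matrix explicitly."""
    return m * n

def householder_floats(m, n):
    """Standard Householder QR: vector i (1-based) has m - i + 1 entries,
    the first of which is an implicit 1, so it costs m - i floats."""
    return sum(m - i for i in range(1, n + 1))

def banded_floats(m, n):
    """Banded form: each vector has m - n + 1 nonzeros with an implicit
    leading 1, so it costs m - n floats."""
    return n * (m - n)

m, n = 10, 4
print(dense_floats(m, n), householder_floats(m, n), banded_floats(m, n))
```

Summing the per-vector costs in `householder_floats` reproduces the closed form $n(m - (n + 1)/2)$, while the banded count matches the dimension of the Grassmannian.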

Proof. Observe that (1) is exactly the factored form produced by standard
Householder QR except for the trailing zeroes in each $v_i$, which correspond
to the extreme lower triangle $i > j + m - n$. To introduce these zeroes,
define $A^F$ as $A$ rotated by 180° (pronounced "flip A"),

$$A^F_{ij} = A_{m-i+1,\,n-j+1}$$

and perform an LQ decomposition of $A^F$:

$$A^F = L Q, \qquad A = L^F Q^F$$

Since $L_{ij} = 0$ for $i < j$, $L^F_{ij} = 0$ for $i > j + m - n$. The
Householder QR algorithm constructs $v_i$ as a linear combination of $e_i$ and
the $i$th column of the matrix (after rotation by the previous Householder
reflections), and the first component of each vector can be chosen to be 1
[1, chap. 5]. Therefore, a Householder QR decomposition

$$L^F = G \begin{pmatrix} R \\ 0 \end{pmatrix}$$

will produce $G$ with the correct banded structure. Our final factorization is

$$A = G \begin{pmatrix} R \\ 0 \end{pmatrix} Q^F = G \begin{pmatrix} B \\ 0 \end{pmatrix}$$

where $B = R Q^F$.
The steps are visualized in Figure 1. Uniqueness of $G$ and $B$ follows from
the uniqueness of the QR decomposition when $A$ is full rank [1, chap. 5]. Note
that $B$ is orthogonal whenever $A$ is orthogonal. □
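
The construction in the proof can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results reported here. All function names are hypothetical, matrices are lists of rows, indices are 0-based, and the LQ step is obtained from a Householder QR of the transpose. The factor $B$ is recovered as the top block of $G^T A$ rather than by forming $Q^F$ explicitly.

```python
import math
import random

def flip(A):
    """Rotate a matrix (list of rows) by 180 degrees."""
    return [row[::-1] for row in A[::-1]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def reflect(v, X):
    """Apply H = I - beta v v^T, beta = 2 / (v^T v), to every column of X in place."""
    beta = 2.0 / sum(t * t for t in v)
    for c in range(len(X[0])):
        s = beta * sum(v[i] * X[i][c] for i in range(len(v)))
        for i in range(len(v)):
            X[i][c] -= s * v[i]

def householder_qr(A):
    """Return (V, R): Householder vectors with V[j][j] == 1 and the
    upper-triangular factor R obtained by applying the reflections to A."""
    rows, cols = len(A), len(A[0])
    R = [row[:] for row in A]
    V = []
    for j in range(min(rows, cols)):
        norm = math.sqrt(sum(R[i][j] ** 2 for i in range(j, rows)))
        pivot = R[j][j] + math.copysign(norm, R[j][j])
        v = [0.0] * rows
        v[j] = 1.0
        if pivot != 0.0:
            for i in range(j + 1, rows):
                v[i] = R[i][j] / pivot
        reflect(v, R)
        for i in range(j + 1, rows):
            R[i][j] = 0.0          # below-diagonal entries are zero up to rounding
        V.append(v)
    return V, R

def banded_householder(A):
    """Factor A (m x n, m > n) as G [B; 0]; returns (V, B)."""
    n = len(A[0])
    # LQ of A^F via QR of its transpose: (A^F)^T = Q R, so A^F = R^T Q^T.
    _, R = householder_qr(transpose(flip(A)))
    LF = flip(transpose(R))        # L^F, zero where i > j + (m - n)
    V, _ = householder_qr(LF)      # banded vectors v_0, ..., v_{n-1}
    GtA = [row[:] for row in A]
    for v in V:                    # G^T A = H_{n-1} ... H_0 A
        reflect(v, GtA)
    return V, GtA[:n]              # bottom m - n rows vanish up to rounding

def apply_G(V, X):
    """Return G X for G = H_0 H_1 ... H_{n-1}."""
    Y = [row[:] for row in X]
    for v in reversed(V):
        reflect(v, Y)
    return Y

random.seed(0)
m, n = 7, 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
V, B = banded_householder(A)
A_rebuilt = apply_G(V, B + [[0.0] * n for _ in range(m - n)])
err = max(abs(A[i][j] - A_rebuilt[i][j]) for i in range(m) for j in range(n))
print("reconstruction error:", err)
```

Each vector in `V` is stored at full length here for readability; a compact implementation would keep only the $m - n$ entries that follow the leading 1.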

Figure 1: To compute a banded Householder factorization of a matrix $A$, we
perform an LQ factorization of $A$ rotated by 180° (denoted $A^F$) to zero the
extreme lower triangle of $A$, then perform a QR decomposition to zero the
upper triangle and construct the banded Householder vectors $v_i$.

Since the construction uses only matrix multiply and Householder QR
decomposition as primitives, the computation of $G$ is stable and requires
$O(mn^2)$ flops. Normally the scalar factors $\beta_i = 2/(v_i^T v_i)$ will be
precomputed and stored for an additional $n$ floats of storage. Once $\beta_i$
is available, a single Householder reflection requires $4(m - n) + 2$ flops and
matrix vector products $Gx$ or $G^T y$ can be computed in $4n(m - n) + 2n$
flops. If blocking is desired, we can represent products of $b$ Householder
reflections where $b$ is the block size using the $I - VTV^T$ representation
[3], which involves a relatively small increase in storage and flops if
$b \ll m - n$.
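
To make the flop count concrete, here is a minimal Python sketch of the unblocked products $Gx$ and $G^T y$ using compact storage. The layout is an assumption made for illustration: `W[i]` holds the $m - n$ entries of $v_i$ that follow its implicit leading 1, `betas[i]` holds the precomputed $\beta_i$, and indices are 0-based. The function names are hypothetical.

```python
def reflect_compact(w, beta, y, i):
    """Apply I - beta v v^T to y in place, where v has an implicit 1 at
    index i followed by the entries of w. Costs 4 * len(w) + 2 flops."""
    s = y[i]
    for k, wk in enumerate(w):
        s += wk * y[i + 1 + k]       # v^T y: 2 * len(w) flops
    s *= beta                        # 1 flop
    y[i] -= s                        # 1 flop
    for k, wk in enumerate(w):
        y[i + 1 + k] -= s * wk       # 2 * len(w) flops

def apply_G(W, betas, x):
    """Return G x for G = H_0 H_1 ... H_{n-1}; the rightmost reflection acts first."""
    y = list(x)
    for i in reversed(range(len(W))):
        reflect_compact(W[i], betas[i], y, i)
    return y

def apply_Gt(W, betas, x):
    """Return G^T x, which applies the same reflections in the opposite order."""
    y = list(x)
    for i in range(len(W)):
        reflect_compact(W[i], betas[i], y, i)
    return y

def matvec_flops(m, n):
    """Total flops for one product with G or G^T: n reflections of 4(m - n) + 2 flops."""
    return n * (4 * (m - n) + 2)

# Small example with m = 4, n = 2 and arbitrary band entries.
W = [[0.5, -1.0], [2.0, 0.25]]
betas = [2.0 / (1.0 + sum(t * t for t in w)) for w in W]
x = [1.0, 2.0, 3.0, 4.0]
print(apply_G(W, betas, x), matvec_flops(4, 2))
```

Each reflection touches only $m - n + 1$ consecutive entries of the vector, which is why the inner loops are short when $m - n$ is small.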

Unfortunately, the banded Householder representation in Theorem 1 is
inefficient for large subspaces of $\mathbb{R}^m$, since $Gx$ involves a large
number of small level 1 BLAS operations if $m - n$ is small. To remedy this
problem, we can represent a large $n$-dimensional subspace in terms of its
small $(m - n)$-dimensional orthogonal complement as follows:

Theorem 2. If $m > n$, any $A \in \mathbb{R}^{m \times n}$ can be factored as

$$A = G \begin{pmatrix} 0 \\ B \end{pmatrix}$$

where $B \in \mathbb{R}^{n \times n}$ is square and $G$ is a banded product of
$m - n$ Householder reflections with $n + 1$ nonzero components per vector, the
first of which is 1 (equivalent to (1) with $m - n$ and $n$ swapped). In
particular, $G$ can also be stored in $n(m - n)$ floats.

Proof. Perform a QR decomposition of $A$ to get

$$A = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix} = U_1 R$$

Here the column span of $U_1 \in \mathbb{R}^{m \times n}$ contains the range of
$A$, and the span of $U_2 \in \mathbb{R}^{m \times (m-n)}$ is contained inside
the nullspace of $A^T$. Banded Householder factorization of $U_2$ gives

$$U_2 = G \begin{pmatrix} Q \\ 0 \end{pmatrix} = \begin{pmatrix} G_1 & G_2 \end{pmatrix} \begin{pmatrix} Q \\ 0 \end{pmatrix} = G_1 Q$$

whence $G_1 = U_2 Q^T$ and

$$G^T A = \begin{pmatrix} G_1^T A \\ G_2^T A \end{pmatrix} = \begin{pmatrix} Q U_2^T A \\ G_2^T A \end{pmatrix} = \begin{pmatrix} 0 \\ G_2^T A \end{pmatrix}$$

since $U_2^T A = U_2^T U_1 R = 0$. Our final decomposition is

$$A = G \begin{pmatrix} 0 \\ G_2^T A \end{pmatrix} = G \begin{pmatrix} 0 \\ B \end{pmatrix}$$

□

Figure 2: Frames from an animation of a digital character using blend shapes
stored in a hierarchically semiseparable representation (HSS). Using banded
Householder form for the rotations in the HSS tree reduces the storage costs by
45.7% over dense storage, or 29.5% over Householder storage.



Using Theorem 1 for $m - n > n$ and Theorem 2 for $m - n < n$, the resulting
$G$ consists of at most $m/2$ Householder vectors each with at least
$m/2 + 1$ nonzero components. In particular, the blocked $I - VTV^T$ form is
efficient whenever $b \ll m, n$, regardless of the value of $m - n$.
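
The selection rule can be written down directly. The sketch below is illustrative only: the function name is hypothetical, and assigning the tie $m - n = n$ to Theorem 1 is an arbitrary choice, since the text does not say which form to use in that case.

```python
def choose_representation(m, n):
    """Pick the theorem whose G has fewer, longer Householder vectors.
    Returns (theorem, number of vectors, nonzeros per vector)."""
    if not 0 < n < m:
        raise ValueError("need 0 < n < m")
    if m - n >= n:                    # tie assigned to Theorem 1 (arbitrary choice)
        return 1, n, m - n + 1
    return 2, m - n, n + 1

print(choose_representation(10, 3))   # small subspace
print(choose_representation(10, 8))   # large subspace
```

In both branches the storage is the vector count times one less than the nonzeros per vector, which equals $n(m - n)$.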



1. Application 

Our motivating application for the banded Householder decomposition is 
the compression of blend shape matrices for digital characters. We start with a 
large, mostly dense matrix where each column represents a pose of the digital 
character mesh. An example face with 42391 vertices and 730 blend shapes is 
shown in Figure 2. The original matrix consumes 348 MB of storage in single
precision. To reduce this, we compute a lossy hierarchically semiseparable
(HSS) representation for the matrix [2], which represents a matrix as a tree of 
rotations and dense blocks. If the rotations are stored in dense form, the HSS 
representation requires 46.8 MB of storage. Using Householder form reduces 
the storage cost to 36.0 MB (77.7% of dense), and banded Householder form
reduces the cost further to 25.4 MB (54.3% of dense). On an 8-core Intel Xeon
2.8 GHz machine, the cost to multiply the HSS representation with a vector is
11.2 ms using dense storage with optimized BLAS and 10.7 ms using banded
Householder storage with handwritten, unvectorized C. Since the required memory
traffic in the banded Householder case is roughly half that of the dense case,
we expect this comparison would improve significantly if the banded Householder
code were appropriately vectorized.